STA 141B PROJECT

DAVID WONG

This project explores data from the covidcast API, with the aim of examining how mask usage and other variables influence COVID-19 cases.

As a first example, we request data from the API to see the number of COVID-19 cases in the US.
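A minimal sketch of such a request follows. It is not the project's own code. It assumes the Delphi Epidata `covidcast` endpoint at `https://api.delphi.cmu.edu/epidata/covidcast/` and assumes that endpoint wraps rows in a `{"result", "epidata", "message"}` JSON envelope. The endpoint, the parameter names, and the `jhu-csse` signal in the usage line are all assumptions to check against the current API documentation. The project may instead have used the `covidcast` Python client. The network call is left out so the helpers can be exercised offline.

```python
from urllib.parse import urlencode

import pandas as pd

# Assumed endpoint of the Delphi Epidata COVIDcast API; verify against current docs.
BASE_URL = "https://api.delphi.cmu.edu/epidata/covidcast/"


def build_covidcast_url(data_source, signal, geo_type, geo_value, start, end):
    """Build a request URL for one daily signal between two YYYYMMDD dates."""
    params = {
        "data_source": data_source,
        "signal": signal,
        "time_type": "day",
        "geo_type": geo_type,
        "geo_value": geo_value,
        "time_values": f"{start}-{end}",
    }
    return BASE_URL + "?" + urlencode(params)


def epidata_to_frame(payload):
    """Turn a decoded JSON response (assumed envelope) into a tidy DataFrame."""
    if payload.get("result") != 1:
        raise ValueError(payload.get("message", "request failed"))
    df = pd.DataFrame(payload["epidata"])
    # time_value is returned as an integer such as 20210601
    df["time_value"] = pd.to_datetime(df["time_value"].astype(str), format="%Y%m%d")
    return df[["geo_value", "time_value", "value"]]
```

For example, `build_covidcast_url("jhu-csse", "confirmed_7dav_incidence_num", "nation", "us", 20210101, 20210601)` builds a URL for national daily cases. The response could then be decoded with `json.load(urllib.request.urlopen(url))` and passed to `epidata_to_frame`.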

More data will be collected.

We will now plot broad visualizations of COVID-19 cases and deaths in the US using matplotlib.
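One way to draw this plot is sketched below as an illustration, not the project's original plotting code. It assumes daily cases and deaths are pandas Series indexed by date. Deaths sit on a second y-axis because they are far smaller than cases and would look flat on a shared scale.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_us_cases_deaths(cases, deaths):
    """Plot daily US cases and deaths on twin y-axes (inputs: date-indexed Series)."""
    # Align the two series on shared dates so both lines cover the same span
    both = pd.concat({"cases": cases, "deaths": deaths}, axis=1).dropna()
    fig, ax_cases = plt.subplots(figsize=(10, 4))
    ax_cases.plot(both.index, both["cases"], color="tab:blue")
    ax_cases.set_ylabel("new cases per day")
    # Deaths are much smaller than cases, so they get their own y-axis
    ax_deaths = ax_cases.twinx()
    ax_deaths.plot(both.index, both["deaths"], color="tab:red")
    ax_deaths.set_ylabel("new deaths per day")
    ax_cases.set_title("COVID-19 in the US")
    fig.autofmt_xdate()
    return fig, ax_cases, ax_deaths
```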

We will now take a closer look at each state's cumulative number of cases and deaths, relative to population, up to the beginning of June 2021. This is shown as a bar graph, with each state's number of deaths plotted below its number of cases.
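A sketch of that graph follows, under two assumptions. First, "proportionate" is read as per 100,000 residents, the scale used by covidcast's `_prop` signals. Second, "deaths below cases" is read as two stacked panels that share the state axis. The `population` input is hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd


def per_100k(counts: pd.Series, population: pd.Series) -> pd.Series:
    """Scale counts to a rate per 100,000 residents (aligned on the state index)."""
    return counts * 100_000 / population


def plot_state_totals(cum_cases: pd.Series, cum_deaths: pd.Series):
    """Cumulative cases (top panel) and deaths (bottom panel) by state."""
    order = cum_cases.sort_values(ascending=False).index
    fig, (top, bottom) = plt.subplots(2, 1, sharex=True, figsize=(14, 6))
    top.bar(order, cum_cases[order], color="tab:blue")
    top.set_ylabel("cases per 100k")
    bottom.bar(order, cum_deaths[order], color="tab:red")
    bottom.set_ylabel("deaths per 100k")
    bottom.tick_params(axis="x", labelrotation=90)
    return fig, top, bottom
```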

From the bar graph, several states immediately stand out. The average cumulative proportionate number of cases per state is around 10,000. States well below the average include Hawaii, Maine, Oregon, Vermont, and Washington. North Dakota, South Dakota, and Rhode Island sit at the opposite end. Utah is of particular interest because its ratio of deaths to cases is unusual.
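The outliers above were identified by eye. As an illustration only, a sketch can make the split explicit: flag states more than `k` standard deviations from the mean rate, and rank states by deaths per case. The threshold `k=1.0` is an arbitrary assumption, not the project's criterion.

```python
import pandas as pd


def flag_outliers(rates: pd.Series, k: float = 1.0) -> pd.Series:
    """Label each state 'below', 'above' or 'typical' by its z-score against the mean."""
    z = (rates - rates.mean()) / rates.std()
    labels = pd.Series("typical", index=rates.index)
    labels[z < -k] = "below"
    labels[z > k] = "above"
    return labels


def deaths_per_case(cum_deaths: pd.Series, cum_cases: pd.Series) -> pd.Series:
    """Deaths per confirmed case, sorted ascending so unusual states sit at either end."""
    return (cum_deaths / cum_cases).sort_values()
```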

We will compare leading and lagging indicators. Leading indicators tend to change before outcomes do. They include mask usage, bar and restaurant visits, and COVID-like symptoms. Lagging indicators reflect infections that have already happened. They include cases, deaths, and hospital admissions.
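The text does not say how lead and lag were compared. One common way to make the distinction concrete is a lagged correlation: shift the lagging series back by `lag` days and correlate it with the leading series. The sketch below does this and is not the project's method. It computes Spearman's rho as the Pearson correlation of ranks and assumes both Series share a contiguous daily index.

```python
import pandas as pd


def lagged_corr(leading: pd.Series, lagging: pd.Series, max_lag: int = 21) -> pd.Series:
    """Spearman correlation between leading[t] and lagging[t + lag] for lag = 0..max_lag.

    Assumes both series share a contiguous daily index, since shift() moves by rows.
    """
    out = {}
    for lag in range(max_lag + 1):
        pair = pd.concat([leading, lagging.shift(-lag)], axis=1).dropna()
        ranks = pair.rank()  # ranking the paired rows turns Pearson into Spearman
        out[lag] = ranks.iloc[:, 0].corr(ranks.iloc[:, 1])
    return pd.Series(out, name="spearman")
```

The lag with the strongest correlation gives a rough idea of how many days the outcome trails the indicator.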

First, the nationwide values will be examined. We will then narrow the analysis to groups of states below and above average. Utah will be examined separately.
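A sketch of the grouping step, assuming per-state rows in covidcast's long format (`geo_value`, `time_value`, `value`) and a hypothetical `labels` Series mapping each state to its group. Utah (`ut` in covidcast's lowercase state codes) keeps a group of its own. The average is an unweighted mean over states, not a population-weighted one.

```python
import pandas as pd


def group_means(long_df: pd.DataFrame, labels: pd.Series, separate=("ut",)) -> pd.DataFrame:
    """Daily unweighted mean of a per-state signal within each group of states."""
    groups = labels.copy()
    for state in separate:
        groups[state] = state  # e.g. Utah is examined on its own
    df = long_df.assign(group=long_df["geo_value"].map(groups))
    return df.pivot_table(index="time_value", columns="group", values="value", aggfunc="mean")
```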

The data will be reshaped to create an interactive time series of the percentage of people wearing masks in each state. If mask-wearing does influence COVID-19 cases, changes in mask usage should be accompanied by changes in the number of cases.
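The reshaping step is sketched below, assuming the mask signal arrives in long format (`geo_value`, `time_value`, `value`). Each state becomes one column. An interactive plotting library such as plotly could then draw one line per column, but that library choice is an assumption. The second helper expresses each state as a percentage-point change from its first reading.

```python
import pandas as pd


def mask_wide(long_df: pd.DataFrame) -> pd.DataFrame:
    """One row per day, one column per state."""
    return long_df.pivot(index="time_value", columns="geo_value", values="value").sort_index()


def change_since_start(wide: pd.DataFrame) -> pd.DataFrame:
    """Percentage-point change relative to each state's first non-missing reading."""
    return wide - wide.bfill().iloc[0]
```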

Overall, mask usage increases over time, with most states consistently at 70-90%. We will modify the data to look separately at how behavior changes in some of the outlier states, including North Dakota, South Dakota, Hawaii, and Utah.
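The "70-90%" observation can be checked numerically, and the outlier states can be pulled out of the per-state table. The sketch below assumes a wide frame with one column per state, keyed by covidcast's lowercase two-letter state codes. States with no reading on a given day are left out of that day's share.

```python
import pandas as pd

OUTLIER_STATES = ("nd", "sd", "hi", "ut")  # covidcast uses lowercase postal codes


def share_in_band(wide: pd.DataFrame, low: float = 70.0, high: float = 90.0) -> pd.Series:
    """Fraction of reporting states whose mask usage lies in [low, high] percent each day."""
    in_band = wide.ge(low) & wide.le(high)  # NaN compares as False
    return in_band.sum(axis=1) / wide.notna().sum(axis=1)


def outlier_view(wide: pd.DataFrame, states=OUTLIER_STATES) -> pd.DataFrame:
    """Keep only the outlier states that are present in the data."""
    return wide[[s for s in states if s in wide.columns]]
```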

Next, we construct a time series of the proportionate number of new cases per day in each state.
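A sketch of deriving that series from cumulative counts, assuming long-format cumulative data and a hypothetical `population` Series. Daily new cases are first differences. Negative differences usually come from data corrections, so they are clipped to zero here; that is a design choice, not necessarily what the project did.

```python
import pandas as pd


def daily_new_per_100k(cum_long: pd.DataFrame, population: pd.Series) -> pd.DataFrame:
    """New cases per day per 100,000 residents, one column per state."""
    wide = cum_long.pivot(index="time_value", columns="geo_value", values="value").sort_index()
    new = wide.diff().clip(lower=0)  # clip negative diffs caused by data corrections
    return new.mul(100_000).div(population, axis="columns")
```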

With a visualization of which states have had spikes in cases, we know which states to watch. Now we will construct scatter plots that compare mask usage with the proportionate number of COVID-like cases in the community.
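The spikes above were read off the plot. A simple numeric counterpart, offered only as an illustration, compares each state's latest 7-day average with the average from one week earlier. The 7-day window and the 1.5x threshold are arbitrary assumptions. The input is a wide frame of daily new cases per 100k with at least two windows of rows.

```python
import pandas as pd


def spiking_states(new_per_100k: pd.DataFrame, window: int = 7, ratio: float = 1.5) -> pd.Series:
    """States whose latest rolling average exceeds the one a window earlier by `ratio`."""
    avg = new_per_100k.rolling(window).mean()
    latest, previous = avg.iloc[-1], avg.iloc[-1 - window]
    growth = latest / previous
    return growth[growth > ratio].sort_values(ascending=False)
```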

We will reshape the dataframes so that mask usage is plotted against COVID-like cases in the community, rather than producing more time series.
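A sketch of that reshaping, assuming two long-format frames: mask usage and the fb-survey community-CLI signal. The column names `mask` and `cmnty_cli` are hypothetical. An inner merge on state and day pairs the two values, and each point in the scatter plot is one state-day.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

KEYS = ["geo_value", "time_value"]


def merge_signals(mask_long: pd.DataFrame, cli_long: pd.DataFrame) -> pd.DataFrame:
    """Pair each state-day's mask usage with its community-CLI value."""
    left = mask_long[KEYS + ["value"]].rename(columns={"value": "mask"})
    right = cli_long[KEYS + ["value"]].rename(columns={"value": "cmnty_cli"})
    return left.merge(right, on=KEYS, how="inner")


def scatter_mask_vs_cli(merged: pd.DataFrame):
    fig, ax = plt.subplots(figsize=(6, 5))
    ax.scatter(merged["mask"], merged["cmnty_cli"], s=5, alpha=0.4)
    ax.set_xlabel("% wearing masks")
    ax.set_ylabel("% who know someone sick in their community")
    return fig, ax
```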

The scatter plots show a clear negative correlation between wearing masks and COVID-like cases. This will be analyzed further with Spearman rank correlation tests.
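For reference, Spearman's rho is the Pearson correlation of the ranks R(X) and R(Y). When there are no tied ranks, it reduces to the closed form on the right, where d_i is the difference between observation i's two ranks. A three-point example with perfectly reversed ranks shows why a value near -1 indicates a strong negative monotone relationship.

```latex
\rho = \frac{\operatorname{cov}\big(R(X), R(Y)\big)}{\sigma_{R(X)}\,\sigma_{R(Y)}}
\;\overset{\text{no ties}}{=}\;
1 - \frac{6\sum_{i=1}^{n} d_i^{2}}{n(n^{2}-1)},
\qquad d_i = R(x_i) - R(y_i)

% ranks (1,2,3) vs (3,2,1): d = (-2, 0, 2), so \sum d_i^2 = 8
\rho = 1 - \frac{6 \cdot 8}{3(3^{2}-1)} = 1 - \frac{48}{24} = -1
```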

The Spearman correlation test gives a rho of about -0.74. This indicates a relatively strong negative correlation between the two variables: the percentage of people wearing masks and the percentage of people who know someone in their community who is sick.
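A minimal sketch of the rho computation, written as the rank-then-Pearson definition so it needs only pandas and NumPy. This is not the project's code and does not reproduce the -0.74 figure. `scipy.stats.spearmanr(x, y)` returns the same rho together with the p-value a significance test needs.

```python
import numpy as np
import pandas as pd


def spearman_rho(x, y) -> float:
    """Spearman's rho: Pearson correlation of the ranks (ties receive average ranks)."""
    pair = pd.DataFrame({"x": x, "y": y}).dropna()
    ranks = pair.rank(method="average")
    return float(np.corrcoef(ranks["x"], ranks["y"])[0, 1])
```

Applied to a merged frame like the one described above, the call would be `spearman_rho(merged["mask"], merged["cmnty_cli"])`, where `merged` and its column names are hypothetical.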

For our final analysis, we will merge the dataframes with two other fb-survey signals. The first is the percentage of people who visited a restaurant or bar in the last 24 hours. The second is the percentage who attended a large event with 10+ people in the last 24 hours. Spearman correlation tests will then be run on the merged data.
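A sketch of the merge-then-correlate step, assuming each signal arrives as a long-format frame. The names used as dictionary keys (for example `mask`, `cmnty_cli`, `restaurant`, `large_event`) are hypothetical labels, not official covidcast signal names. `DataFrame.corr(method="spearman")` produces every pairwise rho at once.

```python
from functools import reduce

import pandas as pd

KEYS = ["geo_value", "time_value"]


def merge_many(frames: dict) -> pd.DataFrame:
    """Inner-join several long-format signals on state and day, one column per signal."""
    renamed = [df[KEYS + ["value"]].rename(columns={"value": name}) for name, df in frames.items()]
    return reduce(lambda left, right: left.merge(right, on=KEYS, how="inner"), renamed)


def spearman_matrix(merged: pd.DataFrame) -> pd.DataFrame:
    """Pairwise Spearman correlations between all signal columns."""
    return merged.drop(columns=KEYS).corr(method="spearman")
```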

Examining the other leading factors, we see a positive correlation between going to a restaurant or large event and knowing someone in the community who is sick. There is also a strong negative correlation between wearing a mask and going out to events and restaurants.